Distributed caching systems and methods

ABSTRACT

Example distributed caching systems and methods are described. In one implementation, a system has multiple host systems, each of which includes a cache resource that is accessed by one or more consumers. A management server is coupled to the multiple host systems and presents available cache resources and resources associated with available host systems to a user. The management server receives a user selection of at least one available cache resource and at least one host system. The selected host system is then configured to share the selected cache resource.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Ser. No. 62/037,010, entitled “Distributed Caching Systems and Methods,” filed Aug. 13, 2014, the disclosure of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present disclosure relates to systems and methods that distribute cache resources across multiple host computing systems.

BACKGROUND

Existing computing systems use cache resources to improve system performance. Typically, each cache is associated with a specific computing system. In computing environments with multiple computing systems (and multiple associated caches), some caches are not utilized (e.g., when the associated computing system is not active) while other caches are operating at maximum capacity due to an active computing system using the cache. This situation results in inefficient use of available cache resources and reduces the overall performance of the computing environment. For example, in virtual environments, virtual machines typically move between different computing systems, which invalidates the virtual machine's previous cache.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting and non-exhaustive embodiments of the present disclosure are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various figures unless otherwise specified.

FIG. 1 is a block diagram depicting an environment within which an example embodiment may be implemented.

FIG. 2 is a flow diagram depicting additional details regarding the environment within which an example embodiment may be implemented.

FIG. 3 is a flow diagram depicting an embodiment of a method for managing shared cache resources.

FIG. 4 is a flow diagram depicting an embodiment of a method for adding a device to an acceleration tier.

FIG. 5 is a flow diagram depicting an embodiment of a method for removing a device from an acceleration tier.

FIGS. 6A and 6B depict a flow diagram illustrating an embodiment of a method for adding a virtual machine to an acceleration tier and managing a dynamic policy.

FIG. 7 is a flow diagram depicting an embodiment of a method for removing a virtual machine from an acceleration tier.

FIGS. 8A and 8B depict a flow diagram illustrating an embodiment of a method for adding a datastore to an acceleration tier.

FIG. 9 is a flow diagram depicting an embodiment of a method for removing a datastore from an acceleration tier.

FIGS. 10A and 10B depict a flow diagram illustrating an embodiment of a method for automatic WriteBack peer selection and automatic destaging of WriteBack data by a peer based on a failure of a primary host.

FIG. 11 is a flow diagram depicting an embodiment of a method for collecting performance and usage data for acceleration tiers.

FIG. 12 is a flow diagram depicting an embodiment of a method for reporting performance and usage data.

FIGS. 13-26 illustrate various example screen shots related to a graphical user interface (GUI) that allows a user to configure, manage, and modify shared cache resources, computer systems, and other components discussed herein.

FIG. 27 is a block diagram depicting an example computing device.

DETAILED DESCRIPTION

In the following description, reference is made to the accompanying drawings that form a part thereof, and in which is shown by way of illustration specific exemplary embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the concepts disclosed herein, and it is to be understood that modifications to the various disclosed embodiments may be made, and other embodiments may be utilized, without departing from the scope of the present disclosure. The following detailed description is, therefore, not to be taken in a limiting sense.

Reference throughout this specification to “one embodiment,” “an embodiment,” “one example,” or “an example” means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment of the present disclosure. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” “one example,” or “an example” in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures, databases, or characteristics may be combined in any suitable combinations and/or sub-combinations in one or more embodiments or examples. In addition, it should be appreciated that the figures provided herewith are for explanation purposes to persons ordinarily skilled in the art and that the drawings are not necessarily drawn to scale.

Embodiments in accordance with the present disclosure may be embodied as an apparatus, method, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware-comprised embodiment, an entirely software-comprised embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Furthermore, embodiments of the present disclosure may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.

Any combination of one or more computer-usable or computer-readable media may be utilized. For example, a computer-readable medium may include one or more of a portable computer diskette, a hard disk, a random access memory (RAM) device, a read-only memory (ROM) device, an erasable programmable read-only memory (EPROM or Flash memory) device, a portable compact disc read-only memory (CD-ROM), an optical storage device, and a magnetic storage device. Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages. Such code may be compiled from source code to computer-readable assembly language or machine code suitable for the device or computer on which the code will be executed.

Embodiments may also be implemented in cloud computing environments. In this description and the following claims, “cloud computing” may be defined as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned via virtualization and released with minimal management effort or service provider interaction and then scaled accordingly. A cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service), service models (e.g., Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”)), and deployment models (e.g., private cloud, community cloud, public cloud, and hybrid cloud).

The flow diagrams and block diagrams in the attached figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flow diagrams or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will also be noted that each block of the block diagrams and/or flow diagrams, and combinations of blocks in the block diagrams and/or flow diagrams, may be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flow diagram and/or block diagram block or blocks.

The systems and methods described herein distribute cache resources across multiple computer systems. In some embodiments, these systems and methods virtualize cache resources (such as Flash and RAM (Random Access Memory) devices) across multiple computer systems to create a clustered pool of high-speed cache resources, which accelerate data read and write operations to shared storage systems. In particular embodiments, the cache resources include one or more RAM devices. Thus, each computer system may access multiple cache resources in any number of other computer systems, rather than being limited to the cache resources within the particular computer system. As discussed herein, a user can configure any cache resource to be shared by any number of computer systems.

Although specific examples discussed herein refer to computer systems that support multiple virtual machines, and acceleration tiers (also referred to as FVP clusters) that contain multiple computer systems, alternate embodiments may use any type of computing architecture, components, and algorithms to implement the systems and methods described herein. Further, the described systems and methods are applicable to the acceleration of any type of storage device, such as flash memory, hard drives, solid state drives, and any other memory or data storage device.

FIG. 1 is a block diagram depicting an environment 100 within which an example embodiment may be implemented. Environment 100 includes a FVP graphical user interface (GUI) 102, which allows users to interact with a FVP management server 104, for example, to configure the operation of various components and processes in an acceleration tier 106. Acceleration tier 106 may also be referred to as a FVP cluster. Acceleration tier 106 includes any number of acceleration resources 108 and any number of acceleration consumers 110. Acceleration resources 108 include systems and methods that accelerate, for example, the processing of data. Acceleration consumers 110 include systems and methods that benefit from, for example, the accelerated processing of data. Additional details regarding acceleration resources 108 and acceleration consumers 110 are discussed herein.

FIG. 2 is a flow diagram depicting additional details regarding environment 100 within which an example embodiment may be implemented. As shown in FIG. 2, FVP cluster (also referred to as an acceleration tier) 106 includes any number of computer systems 202 and 204 coupled to a storage system 240. Storage system 240 may represent any type of data storage architecture and may include any number of physical storage devices. In some embodiments, storage system 240 is implemented as a storage area network (SAN). In other embodiments, storage system 240 includes a local disk, a network file system (NFS), a network attached storage (NAS), and an object store. As discussed herein, a storage system may be referred to generally as a “datastore.”

Each computer system 202, 204 implements multiple virtual machines (VMs). As shown in FIG. 2, computer system 202 implements VMs 206, 208, 210, and 212, while computer system 204 implements VMs 220, 222, 224, and 226. Computer system 202 also includes a host management module 214, a hypervisor 216, and a cache 218. As discussed herein, cache 218 may be shared by any number of computer systems in FVP cluster 106. Host management module 214 communicates with FVP management server 104 to receive instructions for sharing cache 218 with other computer systems in FVP cluster 106. Hypervisor 216 contains a cache manager 236 that manages data in cache 218 with respect to the VM IO stream as observed via hypervisor 216. Cache manager 236 functions to create a clustered, distributed cache of storage acceleration resources. As shown in the embodiment of FIG. 2, hypervisor 216 includes the cache manager 236 and hypervisor 230 includes a similar cache manager 238. Cache managers 236 and 238 support, for example, various distributed caching operations discussed herein. Cache 218 caches data within a local memory (also referred to as a “cache memory”), such as Flash memory or RAM.

When an application executing on one of the VMs 206-212 generates a data read or a data write operation, the application sends the operation to host management module 214. In some embodiments, the data read or data write operation specifies a location in storage system 240. For example, with a data read operation, the FVP cluster checks the local cache (e.g., cache 218 or 232) to see if the data is cached. If the data is cached, the requested data is read from the local cache without requiring any communication with storage system 240. If the data is not cached, the requested data is retrieved from storage system 240 and written to the cache for future reference. For data write operations, the IOs are sent directly to the local cache and the operation is acknowledged to the appropriate VM. Asynchronously, at a later time, the write IO is written to storage system 240 for long-term storage. In some embodiments, cache 218 caches the data record and the specified storage location. As discussed in greater detail herein, in a WriteBack cache, the generated data write operations are subsequently sent to storage system 240.
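
To make the read/write path above concrete, the following is a minimal Python sketch of a per-host cache manager. All names (CacheManager, the datastore object, and its read/write methods) are illustrative assumptions, not the actual FVP implementation; a real cache also handles eviction, sizing, and failure.

```python
# Minimal sketch of the read/write IO path described above (hypothetical names).

class CacheManager:
    def __init__(self, datastore):
        self.cache = {}          # storage location -> cached data record
        self.dirty = set()       # locations awaiting destage (WriteBack)
        self.datastore = datastore

    def read(self, location):
        if location in self.cache:            # cache hit: no datastore IO
            return self.cache[location]
        data = self.datastore.read(location)  # cache miss: fetch and populate
        self.cache[location] = data
        return data

    def write(self, location, data):
        self.cache[location] = data           # acknowledged once cached
        self.dirty.add(location)              # destaged asynchronously later

    def destage(self):
        """Asynchronously flush dirty records to the shared storage system."""
        for location in list(self.dirty):
            self.datastore.write(location, self.cache[location])
            self.dirty.discard(location)
```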

Computer system 204 includes a host management module 228 that is similar to host management module 214, a hypervisor 230 that is similar to hypervisor 216, and a cache 232 that is similar to cache 218. A communication link 234 indicates a logical connection between cache managers 236 and 238. Communication link 234 allows all cache managers in FVP cluster 106 to communicate with one another. This communication includes information related to the sharing of cache resources among all of the computer systems in FVP cluster 106. Communication link 234 may also be thought of as a software layer that allows multiple cache managers to communicate with one another.

FIG. 3 is a flow diagram depicting an embodiment of a method 300 for managing shared cache resources. Initially, a FVP management server identifies available cache resources in a FVP cluster at 302. The identified cache resources include, for example, available cache resources in the multiple computer systems within the FVP cluster. The FVP management server displays the available cache resources to a user through a GUI at 304, such as GUI 102 in FIG. 1. The user selects one or more of the available cache resources at 306. The FVP management server then identifies available consumers in the FVP cluster at 308 and displays the available consumers to the user through a GUI at 310. The user selects one or more consumers in the FVP cluster to share the selected cache resources at 312. Finally, the FVP management server configures the selected consumers to share the selected cache resources at 314.

In some embodiments, method 300 further includes asking the user whether to configure the selected consumers (VMs) with WriteThrough policy or WriteBack policy. If a consumer (VM) is configured with WriteBack policy, then the FVP management server identifies backup caches located in computer systems that are different from the computer system on which the VM resides. For example, if a particular cache resource is selected as a WriteBack cache, then one or more backup caches are identified such that the backup caches are located in computer systems (referred to as “peers”) that are different from the computer system on which the particular cache resides. Thus, the user selects a desired level of redundancy, and the systems and methods described herein automatically perform the operations necessary to implement that level of redundancy. As used herein, a peer is a host (e.g., any computer system) that contains acceleration resources used by another host to store backup copies of WriteBack data. In some embodiments, the backup caches are selected automatically (e.g., by the FVP management server or a component within a computer system). In other embodiments, the backup caches are selected by the user after being presented with a listing of available backup caches.
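
A minimal sketch of the peer-selection constraint described above, assuming hypothetical host objects; the only property the text guarantees is that backup caches reside on hosts other than the one running the VM.

```python
def select_writeback_peers(vm_host, hosts_with_cache, redundancy):
    """Pick `redundancy` backup hosts for a WriteBack VM.

    Backup caches must live on hosts other than vm_host, per the peer
    definition above. Host objects and this helper are illustrative.
    """
    candidates = [h for h in hosts_with_cache if h != vm_host]
    if len(candidates) < redundancy:
        raise RuntimeError("not enough peer hosts for requested redundancy")
    return candidates[:redundancy]
```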

FIG. 4 is a flow diagram depicting an embodiment of a method 400 for adding a device to an acceleration tier. Method 400 begins as a user initiates an operation to add a device to an acceleration tier at 402. Method 400 identifies all computer systems in the acceleration tier at 404 and identifies all devices associated with each of those computer systems at 406. A first device is selected at 408 and method 400 determines whether the device is a local SSD or PCIe flash card (or some other device, such as RAM) at 410. If the device is not a local SSD or PCIe flash card, method 400 selects the next device at 412.

If the device is a local SSD or PCIe flash card, method 400 determines whether the device is empty or unformatted at 414. If the device is neither empty nor unformatted, the next device is selected at 412. If the device is empty or unformatted, the method determines whether the device already contains a FVP partition at 416. If a FVP partition is detected, the next device is selected at 412. If no FVP partition is detected, the device is added to a list of eligible devices at 418. If additional devices remain to be analyzed at 420, method 400 returns to 412 to select the next device. After all devices are analyzed, method 400 presents a list of eligible and non-eligible devices to the user via the user interface at 422. For the selected eligible device(s), method 400 formats the eligible device(s) on the computer system and makes the device(s) ready for acceleration at 424.
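
A hedged sketch of the eligibility checks in method 400. The device attributes (kind, is_empty, has_fvp_partition, and so on) are assumptions chosen to mirror steps 410-418, not actual FVP field names.

```python
def eligible_devices(hosts):
    """Partition all devices in the tier into eligible and non-eligible."""
    eligible, ineligible = [], []
    for host in hosts:
        for dev in host.devices:
            if dev.kind not in ("local_ssd", "pcie_flash", "ram"):
                ineligible.append(dev)   # unsupported device type (step 410)
            elif not (dev.is_empty or dev.is_unformatted):
                ineligible.append(dev)   # contains existing data (step 414)
            elif dev.has_fvp_partition:
                ineligible.append(dev)   # already claimed by FVP (step 416)
            else:
                eligible.append(dev)     # ready for acceleration (step 418)
    return eligible, ineligible
```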

FIG. 5 is a flow diagram depicting an embodiment of a method 500 for removing a device from an acceleration tier. Method 500 begins as a user initiates an operation to remove a device from an acceleration tier at 502. The method also identifies a device selected by the user for removal from the acceleration tier at 504. Method 500 continues by determining whether the device contains WriteBack data at 506. If the device does not contain WriteBack data, the device is unformatted and the FVP partition is deleted at 516.

However, if the device contains WriteBack data, method 500 disables WriteBack for all virtual machines that are actively writing data to the device at 508. Method 500 starts destaging WriteBack data to the shared storage at 510 and waits for destaging to complete for all virtual machines at 512. Method 500 also waits for all virtual machines to transition to WriteThrough policy at 514, then unformats the device and deletes the FVP partition at 516.
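
The destage-before-remove sequence of FIG. 5 might look like the following sketch; the object model (disable_writeback, destaging_complete, etc.) is invented for illustration only.

```python
import time

def wait_until(predicate, poll_seconds=1.0):
    """Block until predicate() is true (simplistic polling helper)."""
    while not predicate():
        time.sleep(poll_seconds)

def remove_device(device, vms, shared_storage):
    if device.contains_writeback_data():                       # step 506
        for vm in vms:
            vm.disable_writeback()                             # step 508
        device.start_destaging(shared_storage)                 # step 510
        wait_until(device.destaging_complete)                  # step 512
        wait_until(lambda: all(vm.policy == "WriteThrough"
                               for vm in vms))                 # step 514
    device.unformat()                                          # step 516
    device.delete_fvp_partition()
```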

FIGS. 6A and 6B depict a flow diagram illustrating an embodiment of a method 600 for adding a virtual machine to an acceleration tier. Method 600 begins as a user initiates an operation to add a virtual machine to an acceleration tier at 602. Method 600 identifies all computer systems in the acceleration tier at 604 and identifies all virtual machine objects associated with each of the computer systems at 606. A first virtual machine object is selected at 608. Method 600 then determines whether the virtual machine is stored on a supported datastore at 610. If the virtual machine is stored on a supported datastore, the method next determines whether it is a blacklisted virtual machine at 612. Each acceleration tier contains a blacklist of consumers (virtual machines) that will not be accelerated, even if they reside on an object (e.g., datastore) that has been added to the acceleration tier. If the virtual machine is not blacklisted, it is added to a list of supported virtual machines at 614.

Method 600 continues by determining whether additional virtual machine objects remain to be analyzed at 616. If additional virtual machine objects remain, method 600 selects the next virtual machine object at 618 and returns to 610 to analyze the newly selected virtual machine object.

After all virtual machine objects are analyzed, a list of supported and non-supported virtual machines is presented to a user via a user interface at 620. For the selected (and supported) virtual machine, method 600 writes the user-selected write policy on the datastore at 622. Method 600 continues by determining whether the virtual machine is powered on at 624. If the virtual machine is not powered on, the method ends. If powered on, method 600 continues by determining whether the virtual machine's write policy is WriteBack at 626. If not, the virtual machine is put into WriteThrough mode at 628. If the virtual machine's write policy is WriteBack, method 600 next determines whether the virtual machine policy has WriteBack peers at 630. If not, the virtual machine is put into WriteThrough mode at 628.

If the virtual machine policy has WriteBack peers, method 600 determines whether the computer system has sufficient peer hosts (e.g., computer systems) selected at 632. If not, the virtual machine is put into WriteThrough mode at 628. If the computer system has sufficient peer hosts selected, method 600 puts the virtual machine into WriteBack mode at 634.
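
Steps 624-634 collapse to a single guarded decision: WriteBack only when the policy requests it and enough peers are available; otherwise WriteThrough. A sketch under assumed attribute names (required_peers=2 follows the two-peer arrangement discussed later):

```python
def apply_write_policy(vm, host, required_peers=2):
    """Decide a powered-on VM's operating mode (steps 624-634, sketch)."""
    if not vm.powered_on:
        return                       # nothing to do (step 624)
    if (vm.write_policy == "WriteBack"            # step 626
            and vm.policy_has_writeback_peers     # step 630
            and len(host.selected_peers) >= required_peers):  # step 632
        vm.set_mode("WriteBack")     # step 634
    else:
        vm.set_mode("WriteThrough")  # step 628
```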

FIG. 6B also illustrates a method for handling movement of a virtual machine between two different computer systems. In particular, method 600 detects that a virtual machine has moved to a different computer system at 636. Method 600 determines the virtual machine's write policy from its parent datastore at 638, then continues at 624 in the manner discussed above. For example, if a virtual machine moves from one host to another host, the FVP management server reads the virtual machine's write policy from its parent datastore. In some embodiments, the same datastore is accessible from multiple hosts such that the multiple hosts are reading from the same datastore.

FIG. 7 is a flow diagram depicting an embodiment of a method 700 for removing a virtual machine from an acceleration tier. Method 700 begins as a user initiates an operation to remove a virtual machine from an acceleration tier at 702. The method identifies a particular virtual machine selected by the user for removal from the acceleration tier at 704 and determines whether the virtual machine has a WriteBack policy at 706. If the virtual machine does not have a WriteBack policy, the virtual machine is removed from the acceleration tier at 716.

If the virtual machine has a WriteBack policy, method 700 prepares the virtual machine to transition to a WriteThrough policy at 708 and starts destaging the remaining data to shared storage at 710. Method 700 then waits for destaging to complete for the virtual machine at 712 and waits for the virtual machine to transition to WriteThrough policy at 714. The virtual machine is then removed from the acceleration tier at 716.

FIGS. 8A and 8B depict a flow diagram illustrating an embodiment of a method 800 for adding a datastore to an acceleration tier. Method 800 starts as a user initiates an operation to add a datastore to an acceleration tier at 802. The method continues by identifying all datastores at 804 and selecting a first datastore object at 806. Method 800 determines whether the datastore is supported at 808. If so, the datastore is added to a list of supported datastores at 810. The method then determines whether the last datastore has been identified at 812. If not, the next identified datastore is selected at 814 and method 800 returns to 808 to determine whether the next identified datastore is supported.

After all datastores have been identified, a list of supported and unsupported datastores is presented to the user via a user interface at 816. For the selected supported datastores, method 800 changes the datastore write policy on one computer system at 818. Additionally, method 800 updates datastore write policies on all other computer systems from which the datastore is accessible at 820. Thus, the FVP management server allows the user to select the write policy (WriteThrough or WriteBack) and writes the selected write policy to the appropriate datastore.

As method 800 continues, for every virtual machine in the datastore at 822, the method determines whether the virtual machine is powered on at 824. If not, the next virtual machine in the datastore is selected. If the virtual machine is powered on, the method determines whether the write policy for the virtual machine is WriteBack at 826. If not, the virtual machine is put into WriteThrough mode at 828. If the virtual machine is requesting WriteBack mode, method 800 determines whether the virtual machine write policy has WriteBack peers at 830. If yes, the method determines whether the computer system has sufficient peer hosts selected at 832. If either determination fails, the virtual machine is put into WriteThrough mode at 828; otherwise, method 800 puts the virtual machine into WriteBack mode at 834 and returns to 822 to select the next virtual machine. Thus, the above activity applies a datastore's write policy to all virtual machines in that datastore.
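
Because the per-VM decision matches the one used by method 600, applying a datastore's write policy reduces to looping over its virtual machines. A sketch reusing apply_write_policy from the earlier example (illustrative names):

```python
def apply_datastore_policy(datastore, host):
    """Apply the datastore's write policy to all of its VMs (sketch)."""
    for vm in datastore.virtual_machines:   # step 822
        apply_write_policy(vm, host)        # steps 824-834, as in method 600
```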

FIG. 9 is a flow diagram depicting an embodiment of a method 900 for removing a datastore from an acceleration tier. Method 900 starts as a user initiates an operation to remove a datastore from an acceleration tier at 902. Method 900 identifies a datastore selected by the user for removal from the acceleration tier at 904 and determines whether the datastore contains any virtual machines with a WriteBack policy at 906. If not, method 900 updates the datastore's write policy to “none” on all computer systems that can access the datastore at 916.

If the datastore contains at least one virtual machine with a WriteBack policy, method 900 disables WriteBack for all virtual machines that are actively writing data to the datastore being removed at 908. Method 900 starts destaging the remaining data to the shared datastore at 910 and waits for destaging to complete for all virtual machines at 912. Method 900 also waits for all virtual machines to transition to WriteThrough policy at 914, then updates the datastore's write policy to “none” on all computer systems that can access the datastore at 916.

FIGS. 10A and 10B depict a flow diagram illustrating an embodiment of a method 1000 for automatic WriteBack peer selection and automatic destaging of WriteBack data by a peer based on a failure of a primary host. Method 1000 starts as a triggering event is detected at 1002. Example triggering events include a flash device failure on one or more computer systems, a complete computer system crash (or a FVP agent not running), or a computer system losing accessibility to a datastore. The method identifies virtual machines with WriteBack policy that are affected by the triggering event at 1004. Method 1000 starts destaging data from peer hosts (e.g., computer systems) that are online and have accessibility to the datastore at 1006. After completion of destaging at 1008, each virtual machine is transitioned to WriteThrough policy at 1010.

Method 1000 continues by identifying a list of computer systems in the FVP cluster and sorting the list by computer system name at 1012. This step, along with the subsequent series of steps, is performed to reassign peer hosts after a failure occurs, which ensures that each host has two functioning peers. The method then arranges the list of computer systems at 1014. In some embodiments, the computer systems are arranged in a logical ring, where each computer system in a list has a member to the left and a member to the right based on its position in the list. The first computer system in the list uses the last computer system as its “left” member and the last computer system in the list uses the first computer system as its “right” member, thereby forming a ring. For each computer system in the list, the remaining steps shown in FIGS. 10A and 10B are performed at 1016 with respect to each computer system.
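
The ring arrangement at 1012-1014 can be expressed compactly: sort the host names, then treat the list as circular so each host has left and right neighbors. A sketch using host names only (real hosts are richer objects):

```python
def ring_neighbors(host_names, host, count=2):
    """Return the `count` hosts to the right of `host` in the sorted ring."""
    ring = sorted(host_names)               # step 1012: sort by name
    i = ring.index(host)
    # Walk right around the ring, wrapping past the end (step 1014).
    return [ring[(i + k) % len(ring)] for k in range(1, count + 1)]

# Example: for hosts ["a", "b", "c", "d"], host "d" gets ["a", "b"].
```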

If the particular computer system has an appropriate number of peers selected at 1018, for each virtual machine with WriteBack policy, the method verifies at 1020 that sufficient peers exist and puts the virtual machine in WriteBack policy. If the particular system does not have an appropriate number of peers selected at 1018, the method continues by selecting a potential peer computer system at 1022. In some embodiments, the potential peer computer system selected is the computer system to the right in the ring configuration. If that computer system cannot be used as a peer (for one of the reasons discussed below) or if only one peer has been selected, then the next peer to the right will be selected. This continues until two peers are selected for the computer system. In other embodiments, any number of peers may be selected for a particular computer system.

If the computer system is not available or a FVP host is not running at 1024, the method returns to select the next computer system at 1022. If the computer system is available and a FVP host is running at 1024, the method determines whether the computer system is using FVP acceleration at 1026. If the computer system is not using FVP acceleration, the method returns to 1022.

If the computer system is using FVP acceleration at 1026, method 1000 determines whether the computer system is in maintenance mode (or disconnected from a vCenter server) at 1028. If the computer system is in maintenance mode or disconnected, the method returns to 1022. If the computer system is not in maintenance mode, the method continues to 1030 and determines whether the computer system has a valid FVP network selected. If the computer system does not have a valid FVP network selected, the method returns to 1022. If the computer system has a valid FVP network selected, method 1000 determines whether the computer system has an acceleration resource added to the FVP cluster at 1032. If the computer system has an acceleration resource added to the FVP cluster, the computer system is set as an eligible peer host at 1034, and method 1000 returns to 1018 to determine if the computer system has an appropriate number of peers selected. Otherwise, method 1000 returns to 1022 to select the next computer system.
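
The eligibility tests at 1024-1032 amount to one predicate over a candidate peer. A sketch with assumed attribute names; note that a host in maintenance mode or disconnected from vCenter is rejected:

```python
def is_eligible_peer(host):
    """Collected eligibility checks of steps 1024-1032 (illustrative)."""
    return (host.available and host.fvp_agent_running       # step 1024
            and host.uses_fvp_acceleration                   # step 1026
            and not host.in_maintenance_mode                 # step 1028
            and not host.disconnected_from_vcenter
            and host.has_valid_fvp_network                   # step 1030
            and host.has_acceleration_resource)              # step 1032
```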

As discussed above with respect to FIGS. 10A and 10B, method 1000 provides for automatic WriteBack peer selection and automatic destaging of WriteBack data by a peer based on a failure of a primary host. Additionally, method 1000 can provide for the initial installation and configuration of a system utilizing one or more WriteBack peers. For example, when installing/configuring a new system, method 1000 may begin at 1012 and perform an initial selection of peers. After this initial selection and configuration is completed, method 1000 continues monitoring the systems for triggering events, as discussed above.

FIG. 11 is a flow diagram depicting an embodiment of a method 1100 for collecting performance and usage data for acceleration tiers. Initially, method 1100 identifies all acceleration tiers at 1102, and identifies all virtual machines and flash devices in each of the acceleration tiers at 1104. Method 1100 continues by identifying performance and usage data from the computer systems in the acceleration tiers at 1106. The method computes performance data (e.g., IOs per second, latency, and throughput) for all virtual machines and flash devices in the FVP clusters at 1108.

For virtual machines, in addition to the overall effective performance (as seen by the virtual machine), data is collected about the performance of the different types of storage each virtual machine is using. This includes flash used from the local computer system, flash used from computer systems that reside somewhere on the network, and the underlying persistent datastore. Additionally, the performance of Read IOs and Write IOs is tracked. Finally, the cross-section of those items is also collected (e.g., “Local Flash Read” or “Datastore Write”). For flash devices, in addition to the overall effective performance (as seen by the flash device), data is collected about the performance of VM Read IOs and VM Write IOs, as well as IOs done by the system for its own internal purposes.

Usage data is computed for all virtual machines and flash devices at 1110. Usage data includes the amount of acceleration resources that a particular virtual machine is using. This usage data can be reported at multiple levels, such as the aggregate usage across all hosts, the usage across an entire network, the local usage, and the amount used for WriteBack destaging. Method 1100 continues by aggregating the performance and usage data for an entire acceleration tier at 1112 and aggregating the performance and usage data for each computing system at 1114. Performance and usage data associated with individual virtual machines and individual flash devices is stored at 1116, and aggregated performance and usage data associated with each acceleration tier is stored at 1118.
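
A sketch of the roll-up at 1112-1114 under an assumed sample schema: rate-like metrics (IOPS, throughput) sum across members, while latency is averaged.

```python
from collections import defaultdict

def aggregate(samples):
    """Roll per-VM/per-device samples up to per-host and per-tier totals.

    Each sample is assumed to be a dict with 'tier', 'host', 'iops',
    'throughput', and 'latency' keys (illustrative schema only).
    """
    new = lambda: {"iops": 0, "throughput": 0, "latencies": []}
    per_host, per_tier = defaultdict(new), defaultdict(new)
    for s in samples:
        for agg in (per_host[s["host"]], per_tier[s["tier"]]):
            agg["iops"] += s["iops"]                # rates sum across members
            agg["throughput"] += s["throughput"]
            agg["latencies"].append(s["latency"])   # latency is averaged
    for agg in list(per_host.values()) + list(per_tier.values()):
        agg["latency"] = sum(agg["latencies"]) / len(agg["latencies"])
        del agg["latencies"]
    return per_host, per_tier
```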

FIG. 12 is a flow diagram depicting an embodiment of a method 1200 for reporting performance and usage data. Initially, method 1200 identifies a FVP cluster, virtual machine, or flash device at 1202 and identifies a desired time range at 1204. The identified FVP cluster, virtual machine, flash device, or time range may be identified (or defined) by a user desiring a report detailing system performance and usage. Method 1200 accesses performance and usage data for the FVP cluster, virtual machine, or flash device during the identified time range at 1206. A report (or user interface display) is generated containing the accessed performance and usage data during the identified time range at 1208.

In specific embodiments, the performance data includes aggregated data at the FVP cluster level, at the individual computer system level, at the individual virtual machine level, or at the individual flash device level. The user interface for defining and generating performance reports allows the user to define time ranges such as “last 10 minutes” or custom time ranges selected by the user. The user interface also presents options for choosing performance data related to IOPS, latency, or throughput.
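
Report generation at 1206-1208 is essentially a filter over stored samples by object and time range. A sketch, with the sample schema assumed as in the earlier aggregation example:

```python
import time

def report_samples(samples, object_id, window_seconds=600):
    """Select samples for one FVP cluster, VM, or flash device over a
    window such as 'last 10 minutes' (window_seconds=600). The 'object'
    and 'timestamp' fields are assumptions for this sketch."""
    cutoff = time.time() - window_seconds
    return [s for s in samples
            if s["object"] == object_id and s["timestamp"] >= cutoff]
```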

FIGS. 13-26 illustrate various example screen shots related to a graphical user interface (GUI) that allows a user to configure, manage, and modify shared cache resources, computer systems, and other components discussed herein. In particular, FIGS. 13 and 14 illustrate GUI screens related to the creation of a new flash cluster. As used herein, a “flash cluster” is substantially equivalent to an acceleration tier or “FVP cluster.” While a flash cluster typically includes flash devices, a FVP cluster may include flash devices, RAM devices, and any other type of memory or storage device. FIG. 15 illustrates a GUI screen related to selecting one or more flash devices.

FIGS. 16 and 17 illustrate a GUI screen related to adding datastores. FIG. 17 also displays selection of a particular write policy, such as WriteThrough or WriteBack. If the WriteThrough policy is selected, write operations are written directly to the storage system and also to cache memory, so these write operations are not accelerated. However, read operations are accelerated. If the WriteBack policy is selected, the datastore (or virtual machine) is operating in an accelerated mode (for both data read operations and data write operations), as discussed herein. The operation is accelerated because the virtual machine writes the data to the cache, then moves on to the next operation without waiting for the data to be written to the storage system. When operating in WriteBack mode, data stored to the cache is written asynchronously to the storage system.

FIG. 18 displays selection of a specific write redundancy, such as local flash only, local flash and one network flash device, or local flash and two network flash devices. This write redundancy is necessary because data written to the cache is not immediately written to the storage system (e.g., storage system 240 in FIG. 2). Thus, if the cache fails before the data is written to the storage system, there is no backup copy of the data. To avoid this problem, the user has options to back up the data to one or two network flash devices located on different computer systems. If the primary cache fails before the data is written to the storage system, one of the backup flash devices will write the data to the storage system instead. In some embodiments, if there are not enough backup devices available to support the selected WriteBack policy, the system may automatically limit the user's selections, or change existing selections to protect the data. For example, if there are no backup devices available, the system may automatically change the virtual machine's policy to WriteThrough until additional backup devices are available to support the WriteBack policy. In some embodiments, after the cache data is successfully written to the storage system, the backup data stored in the backup devices is lazily deleted as the backup devices need the storage space occupied by the old backup data. This approach of lazily deleting the backup data reduces pressure on the backup devices and extends the devices' lifetimes.
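
A sketch of the lazy-deletion idea for peer backup copies: once the primary commits a record to the storage system, the peer's copy is only marked reclaimable and is evicted later, when the backup device actually needs the space. All structure here is assumed for illustration.

```python
class BackupDevice:
    def __init__(self, capacity):
        self.capacity = capacity
        self.live = {}          # key -> backup copy still protecting data
        self.reclaimable = {}   # committed copies, deleted only on demand

    def mark_committed(self, key):
        """Primary has written this record to the storage system."""
        if key in self.live:
            self.reclaimable[key] = self.live.pop(key)

    def store(self, key, data):
        """Store a new backup copy, lazily reclaiming old ones as needed."""
        while self.used() + len(data) > self.capacity and self.reclaimable:
            self.reclaimable.pop(next(iter(self.reclaimable)))
        self.live[key] = data

    def used(self):
        return sum(len(v) for v in list(self.live.values())
                   + list(self.reclaimable.values()))
```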

FIG. 19 illustrates a GUI screen related to successful creation of a flash cluster. FIG. 20 illustrates three devices selected by a user to include in the flash cluster. FIGS. 21-26 illustrate GUI screens related to displaying various configuration, status, and performance data associated with any number of flash clusters, VMs, flash devices, computer systems, storage systems, and the like. For example, FIG. 21 shows the operating status of various hosts and flash devices. In some embodiments, hosts and flash devices illustrated using green boxes are operating properly. Other colors (e.g., red or yellow) indicate a problem, potential problem, or failure of the associated host or flash device. A user can hover a pointer over a host or flash device to see more details regarding the current status. The GUI screens can also display performance information, such as VM IOPS, VM throughput, VM latency, flash hit rate, flash eviction rate, IOs saved from the datastore, datastore bandwidth saved, and writes accelerated.

In some embodiments, GUI screens display various examples of status reporting, such as WriteBack status, synchronization status, and the like. For example, “destaging status” indicates the amount of data remaining to be written to a storage system. For a peer, the GUI displays “Ready in case of failure,” indicating the data has been successfully replicated. Possible device states include “read peer,” “write peer,” and “primary.” A “primary” state indicates that the device is currently responsible for committing WriteBack data to the storage system. A “read peer” state occurs when a virtual machine migrates to a new computer system, but it continues accessing the read cache on the old computer system until the read cache is fully populated on the new computer system.

FIGS. 23-26 illustrate GUI screens related to performance charts for a FVP cluster, a virtual machine, and a flash device. In particular, FIG. 26 illustrates a performance map for an entire FVP cluster. Each larger box represents a single computer system. The individual boxes within that larger box (delineated by white lines) represent virtual machines on that computer system. For example, the upper-left corner of FIG. 26 shows a group of boxes associated with the same computer system. The individual boxes represent virtual machines on that computer system, and the size of each box is determined by each virtual machine's latency. Since the box in the lower left has the largest size in that group, it has the highest latency. Since the lower-right box is the smallest, it has the lowest latency.

FIG. 27 is a block diagram depicting an example computing device 2700. Computing device 2700 may be used to perform various procedures, such as those discussed herein. Computing device 2700 can function as a server, a client, or any other computing entity. For example, computing device 2700 may function as FVP management server 104 (FIG. 1) or one of the computer systems shown in FIG. 2. Computing device 2700 can be any of a wide variety of computing devices, such as a desktop computer, a notebook computer, a server computer, a handheld computer, a tablet, and the like.

Computing device 2700 includes one or more processor(s) 2702, one or more memory device(s) 2704, one or more interface(s) 2706, one or more mass storage device(s) 2708, and one or more Input/Output (I/O) device(s) 2710, all of which are coupled to a bus 2712. Processor(s) 2702 include one or more processors or controllers that execute instructions stored in memory device(s) 2704 and/or mass storage device(s) 2708. Processor(s) 2702 may also include various types of computer-readable media, such as cache memory.

Memory device(s) 2704 include various computer-readable media, such as volatile memory (e.g., random access memory (RAM)) and/or nonvolatile memory (e.g., read-only memory (ROM)). Memory device(s) 2704 may also include rewritable ROM, such as Flash memory.

Mass storage device(s) 2708 include various computer-readable media, such as magnetic tapes, magnetic disks, optical disks, solid state memory (e.g., Flash memory), and so forth. Various drives may also be included in mass storage device(s) 2708 to enable reading from and/or writing to the various computer-readable media. Mass storage device(s) 2708 include removable media and/or non-removable media.

I/O device(s) 2710 include various devices that allow data and/or other information to be input to or retrieved from computing device 2700. Example I/O device(s) 2710 include cursor control devices, keyboards, keypads, microphones, monitors or other display devices, speakers, printers, network interface cards, modems, lenses, CCDs or other image capture devices, and the like.

Interface(s) 2706 include various interfaces that allow computing device 2700 to interact with other systems, devices, or computing environments. Example interface(s) 2706 include any number of different network interfaces, such as interfaces to local area networks (LANs), wide area networks (WANs), wireless networks, and the Internet.

Bus 2712 allows processor(s) 2702, memory device(s) 2704, interface(s) 2706, mass storage device(s) 2708, and I/O device(s) 2710 to communicate with one another, as well as other devices or components coupled to bus 2712. Bus 2712 represents one or more of several types of bus structures, such as a system bus, PCI bus, IEEE 1394 bus, USB bus, and so forth.

For purposes of illustration, programs and other executable program components are shown herein as discrete blocks, although it is understood that such programs and components may reside at various times in different storage components of computing device 2700 and are executed by processor(s) 2702. Alternatively, the systems and procedures described herein can be implemented in hardware, or a combination of hardware, software, and/or firmware. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein.

Although the present disclosure is described in terms of certain preferred embodiments, other embodiments will be apparent to those of ordinary skill in the art, given the benefit of this disclosure, including embodiments that do not provide all of the benefits and features set forth herein, which are also within the scope of this disclosure. It is to be understood that other embodiments may be utilized, without departing from the scope of the present disclosure.

CLAIMS

1. An apparatus comprising: a plurality of acceleration tiers, wherein each acceleration tier includes a plurality of host systems, and wherein each of the plurality of host systems includes a cache and a plurality of virtual machines; and a management server coupled to the plurality of acceleration tiers, the management server configured to: monitor performance data associated with each of the plurality of host systems, the performance data including information associated with the performance of each of the plurality of virtual machines and each cache in the plurality of host systems; and aggregate the performance data for a particular host system, wherein the aggregated performance data for the particular host system includes information associated with the performance of virtual machines within the particular host system and the performance of the cache in the particular host system.
2. The apparatus of claim 1, wherein the management server is further configured to monitor performance data during a specified time period.

3. The apparatus of claim 1, wherein the performance data includes at least one of IOPS, latency, and throughput.
4. The apparatus of claim 1, wherein the management server is further configured to display the performance data to a user through a graphical user interface.
5. The apparatus of claim 1, wherein the management server is further configured to generate a report containing the aggregated performance data for the particular host system.
6. The apparatus of claim 1, wherein the management server is further configured to aggregate performance data from the plurality of host systems.
7. The apparatus of claim 6, wherein the management server is further configured to generate a report containing the aggregated performance data for the plurality of host systems.
8. The apparatus of claim 1, wherein the management server is further configured to aggregate performance data associated with a plurality of storage devices in the plurality of acceleration tiers.

9. The apparatus of claim 1, wherein the management server is further configured to generate a report containing usage data for each virtual machine associated with a particular storage device.
10. The apparatus of claim 1, wherein the management server is further configured to generate a report containing usage data for each storage device in the plurality of acceleration tiers.
11. The apparatus of claim 1, wherein the management server is further configured to monitor read IOs and write IOs.
12. The apparatus of claim 1, wherein the management server is further configured to monitor performance of a local cache, performance of a network cache, and performance of a datastore.
13. The apparatus of claim 1, wherein the management server is further configured to monitor performance of local flash IOs and network flash IOs.
14. The apparatus of claim 1, wherein the management server is further configured to monitor virtual machine read IOs and virtual machine write IOs.
15. The apparatus of claim 1, wherein the management server is further configured to monitor local read IOs by a flash device, local write IOs by a flash device, network read IOs by a flash device, network write IOs by a flash device, datastore read IOs, and datastore write IOs.
16. A method comprising: identifying, using one or more processors, a plurality of acceleration tiers, wherein each acceleration tier includes a plurality of host systems, and wherein each of the plurality of host systems includes a cache and a plurality of virtual machines; monitoring, using the one or more processors, performance data associated with each of the plurality of host systems, wherein the performance data includes information associated with the performance of each of the plurality of virtual machines and each cache in the plurality of host systems; and aggregating, using the one or more processors, the performance data for a particular host system, wherein the aggregated performance data for the particular host system includes information associated with the performance of virtual machines within the particular host system and the performance of the cache in the particular host system.
17. The method of claim 16, wherein the monitoring of performance data occurs during a specified time period.

18. The method of claim 16, wherein the performance data includes at least one of IOPS, latency, and throughput.
19. The method of claim 16, further comprising generating a report containing the aggregated performance data for the particular host system.
20. The method of claim 16, further comprising aggregating performance data associated with a plurality of storage devices in the plurality of acceleration tiers.